Abstract

Transposable elements (TEs) are a class of mobile genetic elements (MGEs) that make up approximately 45% of the human genome and were long regarded as junk DNA. Although most of these elements have been rendered inactive by mutations and gene-silencing mechanisms, some TEs, such as long interspersed nuclear elements (LINEs), remain active and can translocate within the genome. During transposition, they may create lesions in the genome and thereby modify it. Approximately 65 disease-causing LINE insertion events have been reported thus far; however, any role of TEs in complex disorders is not well established. Chronic obstructive pulmonary disease (COPD) is one such complex disease, caused primarily by cigarette smoking. Although the exact molecular mechanism underlying COPD remains unclear, oxidative stress is thought to be the main factor in its pathogenesis. In this review, we explore the potential role of oxidative stress in the epigenetic activation of TEs such as LINEs and the subsequent cascade of molecular damage. Recent advances in sequencing and computation have eased the identification of mobile elements. A comparative study of the activity of these elements and of markers of genome instability would therefore give more insight into the relationship between MGEs and complex disorders such as COPD.
A) Transposable elements and their mobility

Transposable elements (TEs) account for nearly half (approximately 45%) of the human genome, in contrast to functional genes, which constitute a much smaller proportion (approximately 5%) [1]. Based on their mechanism of transposition, TEs are classified as class 2 elements or DNA transposons ('cut and paste' mechanism via DNA intermediates) and class 1 elements or retrotransposons ('copy and paste' mechanism via RNA intermediates) [1,2]. Of these, retrotransposons are the most important TEs because each transposition event leaves the original copy in place and adds a new one, so they can amplify and increase the size of the host genome. This ability to copy themselves enables class 1 elements to strongly affect genome evolution. Retrotransposons are further subdivided into long terminal repeat (LTR) elements and non-LTR elements.
Long interspersed nuclear elements (LINEs) are non-LTR elements, that is, they lack LTRs at their ends. Most LINEs belong to the LINE-1 (L1) family, which constitutes approximately 17% of the human genome; L1s are the only human TEs capable of transposing autonomously [1,3]. Although the majority of L1s have been rendered inactive as molecular fossils by 5' truncations and inversions [4], approximately 80-100 retrotransposition-competent L1s (RC-L1s) remain active. An active L1 is approximately 6 kb in length and contains a 5'-UTR, two open reading frames (ORF1 and ORF2) and a 3'-UTR with the characteristic poly(A) tail (Figure 1a) [3,5]. L1-encoded proteins show either cis or trans preference [6]. Proteins with cis preference (ORF1p and ORF2p) act preferentially on the L1 RNA that encoded them, aiding its nuclear import and integration into the genome (Figures 1b, 2a) [7]. Proteins acting in trans assist the translocation of other, non-autonomous elements such as short interspersed nuclear elements (SINEs) (Figure 1a) [6].
Figure 1 a) List of mobile elements, their structure and their distribution in the human genome: reference sequence (HGR) and structure, with examples (italicized). Abbreviations: IR (inverted repeats); LTR (long terminal repeat); Gag (group-specific antigen); Pol (polymerase); Env (envelope protein); UTR (untranslated region); EN (endonuclease); RT (reverse transcriptase); C (cysteine-rich domain); ORF (open reading frame); An (poly(A) tail); A & B (sequences of the RNA pol III promoter); Ins (insertional sequence); TSD (target site duplication); VNTR (variable number of tandem repeats); SINE-R (domain derived from a previous translocation). Figure 1b) Mobile element insertion by target-primed reverse transcription (TPRT): i) the endonuclease (EN) encoded by the transposon cleaves the first DNA strand at the target site; ii) the second DNA strand is cleaved; iii) L1 RNA anneals to the nick site; iv) reverse transcription is initiated by the retrotransposon-encoded reverse transcriptase (RT); v) integration; vi) DNA synthesis yields the new insert, with target site duplications flanking the newly integrated region.
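The staggered cleavage in steps i) and ii) is what produces the target site duplications in step vi). The toy Python sketch below illustrates only that bookkeeping, under simplifying assumptions: the target is treated as a single top-strand string, the two nicks fall at 0-based offsets `left_nick < right_nick`, and reverse transcription chemistry, 5' truncation and strand identity are ignored. The function name and its parameters are hypothetical.

```python
def simulate_tprt_insertion(target: str, left_nick: int, right_nick: int,
                            insert: str) -> tuple[str, str]:
    """Toy model of a staggered-nick insertion (hypothetical helper).

    The bases between the two nicks are single-stranded after cleavage; gap
    filling on both sides of the new insert copies them, so they end up
    flanking the insert twice as a target site duplication (TSD).
    Returns (sequence_after_insertion, tsd).
    """
    if not 0 <= left_nick < right_nick <= len(target):
        raise ValueError("require 0 <= left_nick < right_nick <= len(target)")
    tsd = target[left_nick:right_nick]
    # Left flank runs up to the right nick and the right flank restarts at the
    # left nick, so the segment between the nicks appears on both sides.
    new_sequence = target[:right_nick] + insert + target[left_nick:]
    return new_sequence, tsd


if __name__ == "__main__":
    seq, tsd = simulate_tprt_insertion("AAAACCCCGGGG", 4, 8, "ttttaaaa")
    print(tsd)  # CCCC
    print(seq)  # AAAACCCCttttaaaaCCCCGGGG
```

The lowercase insert only makes the new material easy to spot; the point is that the length of the duplication equals the distance between the two nicks.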



B) Transposition and genome instability 

Genome integrity is a crucial determinant in passing genetic information from one generation to the next. TE-associated genetic alterations such as aberrant mRNA splicing, introduction of premature stop codons and transcriptional disruption threaten this integrity. Double-stranded breaks (DSBs) generated by TEs [8] produce tracts of non-allelic sequence that can disrupt the major homology-based repair system (homologous recombination repair, HRR). This in turn can result in large-scale insertions/deletions (indels), inversions and chromosomal rearrangements through non-allelic homologous recombination (NAHR) [9]. Thus far, more than 25 insertion-mediated disorders have been reported [10]. Furthermore, TEs play a crucial role in the genesis of genetic variants such as microsatellite repeats [11]. For instance, before integration, L1s and SINEs undergo 3' extension to generate a 3' A-rich tail [3,5], which directs further integration of TEs [12]. These newly integrated retrotransposons can readily mutate into pro-microsatellite sequences and become highly unstable structures through processes such as polymerase slippage [13], resulting in microsatellite instability (MSI). Such an association has been reported for microsatellite-initiating mobile elements (mini-me) in dipteran taxa [14], which carry pro-microsatellite sequences. After mini-me insertion, slippage-associated mutations introduce variation at these loci to generate microsatellites. The mechanism observed in dipteran genomes seems to be common among eukaryotes, in which elements with cryptic repeats tend to decay into microsatellites through insertion-mediated mutations [14].
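To make the notion of a microsatellite concrete, the Python sketch below scans a sequence for perfect tandem repeats of a 1-6 bp unit, such as A-rich tails and their slippage-derived derivatives. It is a simplified illustration under stated assumptions: only uninterrupted repeats are reported, the minimum of four copies is an arbitrary threshold, and a run whose unit is itself a repeat of a shorter unit (for example `AA`) is reported only at its shortest period. Dedicated tandem repeat finders also handle imperfect repeats, which this sketch does not.

```python
import re


def is_primitive(unit: str) -> bool:
    """True if `unit` is not a repeat of a shorter unit ('CA' -> True, 'CACA' -> False)."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_microsatellites(seq: str, min_unit: int = 1, max_unit: int = 6,
                         min_copies: int = 4) -> list[tuple[int, str, int]]:
    """Return (start, repeat_unit, copy_number) for perfect tandem repeats in `seq`.

    The thresholds are illustrative assumptions, not a published definition.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # A k-mer followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_primitive(unit):  # skip e.g. 'AA' runs already reported as unit 'A'
                hits.append((match.start(), unit, len(match.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGT" + "CA" * 6 + "TTG" + "A" * 10 + "C"
    print(find_microsatellites(example))  # [(3, 'CA', 6), (18, 'A', 10)]
```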

Microsatellites mutate at much higher rates than the point mutation rate elsewhere in the genome, which makes them potent regulators of gene expression [15]. MSI, a type of genomic instability arising from changes in the number of short (2-6 bp) tandem repeat units in microsatellites, modulates several malignant and benign diseases [16]. MSI is studied by amplifying microsatellites located near a putative gene and examining the shift in electrophoretic pattern caused by the gain or loss of repeat units [17]. Genetic studies have already implicated MSI as a source of acquired mutations in benign lung conditions [18] and as a potential marker for asthma, chronic obstructive pulmonary disease (COPD) and idiopathic pulmonary fibrosis [17,19,20]. Epithelial cells lining the trachea, bronchi and bronchioles are prone to such mutations [21]. These mutations can persist even after smoking cessation, possibly explaining the persistent inflammation seen in ex-smokers.
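The marker-level logic behind electrophoretic MSI testing can be sketched as follows. This Python example is illustrative only and rests on assumptions: allele sizes (in bp) have already been called for matched normal and test DNA at each marker, a marker counts as unstable when the test sample shows an allele size absent from the matched normal, and the 30% cut-off between MSI-high and MSI-low is an assumed threshold. The marker names are placeholders.

```python
from typing import Dict, List, Sequence, Tuple

# marker name -> (allele sizes in normal DNA, allele sizes in test DNA), in bp
Profiles = Dict[str, Tuple[Sequence[int], Sequence[int]]]


def repeat_unit_shift(normal_bp: int, test_bp: int, unit_len: int) -> float:
    """Size shift expressed in repeat units (negative = contraction)."""
    return (test_bp - normal_bp) / unit_len


def unstable_markers(profiles: Profiles) -> List[str]:
    """Markers whose test sample carries an allele size not seen in the matched normal."""
    return sorted(name for name, (normal, test) in profiles.items()
                  if set(test) - set(normal))


def msi_status(profiles: Profiles, high_cutoff: float = 0.3) -> str:
    """Classify a sample as MSS, MSI-L or MSI-H (the cut-off is an assumption)."""
    if not profiles:
        raise ValueError("no markers supplied")
    fraction = len(unstable_markers(profiles)) / len(profiles)
    if fraction == 0:
        return "MSS"
    return "MSI-H" if fraction >= high_cutoff else "MSI-L"


if __name__ == "__main__":
    sample = {
        "marker_1": ([120, 124], [120, 124]),
        "marker_2": ([98], [96, 98]),        # extra allele two bases shorter
        "marker_3": ([150, 152], [150, 152]),
        "marker_4": ([200], [200]),
        "marker_5": ([110, 114], [110, 114]),
    }
    print(unstable_markers(sample))          # ['marker_2']
    print(msi_status(sample))                # MSI-L (1 of 5 markers = 20%)
    print(repeat_unit_shift(98, 96, 2))      # -1.0, one dinucleotide unit lost
```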
Figure 2 a. Life cycle of the L1 retrotransposon. i) Transcription: the L1 life cycle starts with transcription of an active L1 in the genome through recruitment of transcription factors, followed by polyadenylation and splicing to form L1 RNA, which is exported from the nucleus. ii) Translation: active L1 RNA encodes the ORF1 and ORF2 proteins, which bind retrotransposition-competent L1 (RC-L1) RNA to form the L1 RNP (ribonucleoprotein) complex that is imported into the nucleus for retrotransposition. iii) Insertional events: the L1 ORF2 endonuclease creates DSBs, followed by iv) integration: the lesions created by L1 ORF2 activity are repaired and the element is integrated into the genome by TPRT. v) Heavy metals and other smoke particles can interact with the L1 life cycle either at the early stages, by altering the methylation profile (epigenetic alteration) and thereby activating L1, or at the late repair stages, by impairing the repair pathway and thereby causing accumulation of somatic mutations (granulated cells). b. Effect of somatic mutation accumulation on disease onset and exacerbation. Mutated somatic cells are recognized by the host as foreign and are presented by antigen-presenting cells (APCs), triggering a cascade of pathways involving T helper cells (Th) and cytotoxic T cells (Tc), which migrate to the affected site and release various mediators that induce cell death. Failure of effective efferocytosis results in aberrant structural remodeling and the characteristic onset of COPD. Mutant cells can also interact with transcription factors to increase cytokine release and the consequent recruitment of inflammatory cells, destabilizing the immune balance and manifesting the features of COPD.




Studies on the bronchial epithelium of smokers [22] further support the view that epithelial cells are the primary site of MSI activity. Furthermore, MSI is significantly associated with exacerbation frequency in patients with COPD [23]. A COPD exacerbation is an acute worsening of respiratory symptoms accompanied by physiological deterioration. Because exacerbation frequency is related to disease severity [24], the possible role of MSI in regulating this frequency is an interesting avenue for study.

C) Transposable elements and complex lung disorders

COPD is a complex lung disorder and a leading cause of morbidity and mortality. The 2011 WHO estimates indicate that 64 million people have COPD; moreover, COPD is reported to cause 3 million deaths worldwide, making it the fifth leading cause of death worldwide [25]. COPD manifests as the co-occurrence of conditions such as chronic bronchitis (inflammation of the bronchi) and emphysema (alveolar wall destruction) [26]. Cigarette smoking is the most common cause of COPD and is associated with inflammation, high cell turnover and oxidative stress, leading to proteolytic damage of the lungs. Nearly all smokers develop inflammation, but only a fraction (10%-15%) develop COPD and even fewer (1%-3%) develop lung cancer [21]. This peculiar distribution suggests that acquired (somatic) mutations may be a prerequisite in the pathobiology of COPD. Estimates show that genetic alterations account for up to 50% of COPD cases [27]. Marked variability in the development of airflow obstruction among smokers [28], familial aggregation of pulmonary function in monozygotic and dizygotic twins [29], and differences in clinical outcome between first-degree relatives of patients and controls [30] are some of the facts that support a role for genetic factors in COPD development. In addition, linkage and candidate gene association studies have identified an array of genetic determinants in the pathogenesis of COPD [26]. Although there are reports of genomic instability events in complex disorders such as COPD and cancer [15,31], the association of these events with TE activity remains obscure. It is nevertheless possible that TEs such as L1s play a vital role in the disease phenotype by introducing somatic mutations and thereby affecting genome integrity.

TEs can be acquired as somatic mutations over a lifetime; the presence of L1 activity in tumour cells but not in the surrounding healthy cells supports this hypothesis [32]. Propagation of TEs in the somatic line is facilitated by their expansion in germ cells or during the embryonic stage. In addition, retrotransposition events occurring in germ cells greatly increase the chance of TE propagation to subsequent generations [33]. For instance, family studies on an ocular disease show that mothers of patients exhibit both somatic and germline mosaicism for an L1 insertion in the disease gene, suggesting that retrotransposition can occur during embryogenesis [34]. Retrotransposition events occurring during developmental stages can create somatic mosaicism. Kano et al. (2009) studied such events and found L1 RNA in embryonic cells and in adult tissues such as the lung [35]. Further quantitative analysis showed that the frequency of retrotransposition was higher in somatic tissues than in reproductive cells. A recent study supports this claim: the level of L1 RNA in the oesophagus and lung was the same as that in HeLa cells [36]. A growing body of results from molecular studies on transgenic models emphasises the risk of such genetic alterations in organ development. It is possible that active L1-mediated retrotransposition disrupts genes that regulate lung growth in early life, resulting in developmental deformity. This may further lead to lung damage by host machinery (protease/anti-protease imbalance) or by environmental factors (cigarette smoking, pollutants). For instance, it is already known that epigenetic changes during lung development play a vital role in the development of bronchopulmonary dysplasia (BPD) [37] and that the associated lower lung function can ultimately result in the development of COPD [38].

D) Epigenetics of transposable elements 

The study of heritable changes in gene regulation that do not alter the DNA sequence is a hot topic, particularly in cancer biology. DNA methylation is one such epigenetic regulator; it plays a decisive role in developmental biology and pathobiology through processes such as X-chromosome inactivation and retrotransposon silencing [39]. Approximately one-third of DNA methylation occurs in mobile elements such as Alu and L1s [40], which keeps these elements inactive and makes them surrogate markers for global methylation analysis. These sites can be hypomethylated by environmental influences, leading to genome instability and altered gene expression [41]. Reports on the association between global hypomethylation and genomic instability [20] suggest that hypomethylated L1s in airway epithelial cells are associated with higher levels of microsatellite instability. A recent study supports this hypothesis by showing an association between hypomethylation of L1 elements and a faster rate of decline in lung function measures such as FEV1 (forced expiratory volume in 1 second) and FVC (forced vital capacity) [42]. Because lung function tests are a major determining factor in diagnosing lung disorders and measuring their severity, the impact of hypomethylation on lung function is intriguing. Other environmental factors, such as wood smoke exposure, may also contribute to this type of association [43]. Environmental factors are a known source of oxidative stress, and any associated epigenetic alterations at the microsatellite level manifest as acquired mutations, resulting in MSI [44]. Such instability events have already been studied in COPD patients by measuring a by-product of oxidant-induced DNA damage, the 8-hydroxydeoxyguanosine (8-OHdG) marker [31].
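L1 methylation levels used as a surrogate for global methylation are commonly derived from bisulfite-treated DNA, in which unmethylated cytosines read as T while methylated CpG cytosines remain C. The Python sketch below shows only that arithmetic, under simplifying assumptions: reads are already aligned to the same top-strand reference fragment without insertions or deletions, bisulfite conversion is complete, and only CpG sites are scored. The averaged value is one simple per-sample summary, and the sequences in the usage example are made up.

```python
def cpg_sites(reference: str) -> list[int]:
    """0-based positions of the C in each CpG dinucleotide of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def cpg_methylation(reference: str, reads: list[str]) -> dict[int, float]:
    """Percent methylation at each CpG: C calls / (C + T calls) x 100."""
    levels = {}
    for pos in cpg_sites(reference):
        calls = [r[pos].upper() for r in reads if pos < len(r)]
        informative = [c for c in calls if c in ("C", "T")]  # ignore N and other calls
        if informative:
            levels[pos] = 100.0 * informative.count("C") / len(informative)
    return levels


def mean_methylation(levels: dict[int, float]) -> float:
    """Average percent methylation over the scored CpG sites."""
    if not levels:
        raise ValueError("no informative CpG sites")
    return sum(levels.values()) / len(levels)


if __name__ == "__main__":
    ref = "ACGTTCGA"  # made-up fragment with CpGs at positions 1 and 5
    reads = ["ACGTTCGA", "ACGTTTGA", "ACGTTTGA", "ATGTTCGA"]
    levels = cpg_methylation(ref, reads)
    print(levels)                    # {1: 75.0, 5: 50.0}
    print(mean_methylation(levels))  # 62.5
```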

E) Oxidative stress and hypomethylation 

In recent years, there has been growing interest in the effects of oxidative stress on epigenetic gene regulation by DNA methylation. Oxidative stress caused by an oxidant/anti-oxidant imbalance plays a central role in the pathogenesis of COPD [45]. Oxidant release results in inactivation of anti-proteases, neutrophil sequestration and expression of pro-inflammatory cytokine genes. Cigarette smoke is an exogenous source of such oxidants and contains a high proportion of free radicals in both the tar and gas phases. The smoke interacts with the epithelial lining fluid to form cigarette smoke condensate, which in turn produces more reactive oxygen species [46]. In addition, under stress, inflammatory cells (neutrophils and macrophages) can act as an endogenous source of oxidants, which in turn damage components of the lung matrix by proteolytic cleavage, leading to emphysema [45].

Under oxidative conditions, GC-rich sites are highly susceptible: guanine, which has the lowest redox potential of the four DNA bases [47], is oxidized to a neutral guanyl radical. These neutral radicals react with superoxide from cigarette smoke to form 8-OHdG [48]. 8-OHdG, a stable oxidation product, inhibits the binding of DNA methyltransferase, resulting in the demethylation of guanine [49] and cytosine residues [50]. Furthermore, 8-OHdG can cause G>T transversions that eliminate CpG dinucleotides, the main targets of methylation, leading to further hypomethylation [51]. Because susceptibility to oxidative stress depends on base composition, clusters of GC-rich CpG dinucleotides can serve as major targets. For instance, the bicistronic L1 mRNA (encoding ORF1 and ORF2) has a 5'-UTR with a high GC content (approximately 60%) [52,53]. In one study on bladder cancer, patients with increased oxidative stress exhibited hypomethylation of L1 elements [54]. Similarly, global methylation analysis of lung adenocarcinoma samples showed hypomethylation of L1s that resulted in increased mobility and subsequent gene disruption [20]. Oxidative stress-induced demethylation can result from environmental factors such as smoke exposure, ageing, UV radiation and lifestyle factors. For instance, prenatal exposure to tobacco smoke is significantly associated with global (L1 and Alu) demethylation in adulthood [55]. In addition, cigarette smoking along with inhalation of traffic particles decreases L1 methylation in blood DNA [56]. All these studies point to oxidative stress as a regulator of the methylation pattern of TEs. Under oxidative stress, these sites can undergo hypomethylation, resulting in the activation and transposition of L1s (Figure 2a); this can lead to deleterious structural alterations in the genome (mutant cells) [41], followed by a cascade of signalling events (Figure 2b). Such events can trigger cell death and/or an inflammatory response, with a continuous cycle of inflammation leading to a continued decline in lung function. Together, these studies suggest that these are not isolated events in the development of COPD and that oxidative stress-mediated epigenetic changes may play a central role in its pathogenesis.
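The link between base composition, CpG density and the G>T transversions associated with 8-OHdG can be shown with a few lines of Python. This is an illustrative sketch on a made-up sequence, not on the L1 5'-UTR itself: it computes GC content, counts CpG dinucleotides, and shows that a single G>T change inside a CpG removes one potential methylation site. The function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in `seq` (0.0-1.0)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_count(seq: str) -> int:
    """Number of CpG dinucleotides (5'-CG-3') on the given strand."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def g_to_t(seq: str, pos: int) -> str:
    """Apply a G>T transversion at 0-based `pos`."""
    if seq[pos].upper() != "G":
        raise ValueError(f"position {pos} is not a G")
    return seq[:pos] + "T" + seq[pos + 1:]


if __name__ == "__main__":
    seq = "ACGTCGGA"                 # made-up example with two CpGs
    print(gc_content(seq))           # 0.625
    print(cpg_count(seq))            # 2
    mutated = g_to_t(seq, 2)         # G of the first CpG becomes T
    print(mutated, cpg_count(mutated))  # ACTTCGGA 1
```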

F) Identification of transposable element activity in the genome

Marked variability in the distribution of active TEs between individuals is a direct consequence of their activity in somatic tissues and the low selection pressure these elements encounter. This allows them to evolve rapidly at different sites, which makes their identification in the genome arduous. Over the last two decades, new approaches have been applied to identify mobile elements. Earlier studies mostly used prior knowledge of mutant genes to characterize mobile elements by cloning and sequencing [11,57], an approach later refined by tools such as PCR [58]. The sheer complexity and wide distribution of these elements make their identification a mammoth task, particularly with the massive volume of data generated by new applications such as next-generation sequencing (NGS).

Two groups of methods, de novo discovery and homology-based approaches, are briefly discussed here. De novo methods typically compare shotgun sequencing reads with one another to find repeated sequences and then cluster the matched pairs to give a consensus sequence for each TE family [59]. Unlike de novo discovery, the homology-based approach uses prior knowledge of TE sequences, such as sequence similarity, to identify TEs of a similar class, including those with low copy number. Figure 3 outlines the main steps of a computational study of repetitive elements; putative L1 insertions are identified by comparing clusters of consensus alignments from the same sequence reads. A read pair whose alignment to the reference genome matches paired-end expectations (mate orientation and distance) is concordant; conversely, a discordant alignment that does not match these expectations can represent a novel structural variant (SV) site [60]. Recent studies have enhanced the sensitivity and specificity of this procedure by using refined versions of the algorithm that take the diploid nature of the genome into account [61]. As a valuable addition to paired-end read alignment, Ewing et al. (2010) used the orientation and structural characteristics of the reads to identify 1016 novel L1 insertions [62].
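The concordant/discordant distinction and the clustering of supporting reads can be sketched in Python as below. The assumptions are deliberately simplified and hypothetical: each read pair is reduced to the chromosome, mapping position and strand of each mate; a concordant pair lies on one chromosome in forward-reverse orientation with a mate distance of 200-600 bp (library-specific values chosen only for illustration); and reads were aligned to a reference augmented with an L1 consensus contig named `L1_consensus`, so a pair with one mate on that contig and the other anchored in the genome supports a candidate insertion nearby. This is not the algorithm of any specific published tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def is_concordant(p: ReadPair, min_dist: int = 200, max_dist: int = 600) -> bool:
    """Same chromosome, forward-reverse orientation and expected mate distance."""
    if p.chrom1 != p.chrom2:
        return False
    left, right = sorted([(p.pos1, p.strand1), (p.pos2, p.strand2)])
    if (left[1], right[1]) != ("+", "-"):
        return False
    return min_dist <= right[0] - left[0] <= max_dist


def cluster_anchors(anchors: Iterable[Tuple[str, int]], window: int = 500,
                    min_support: int = 3) -> List[Tuple[str, int, int, int]]:
    """Group anchor positions lying within `window` bp of the previous one.

    Returns (chrom, first_pos, last_pos, supporting_pairs) for clusters with
    at least `min_support` pairs.
    """
    clusters: List[list] = []
    for chrom, pos in sorted(anchors):
        if clusters and clusters[-1][0] == chrom and pos - clusters[-1][2] <= window:
            clusters[-1][2] = pos
            clusters[-1][3] += 1
        else:
            clusters.append([chrom, pos, pos, 1])
    return [tuple(c) for c in clusters if c[3] >= min_support]


def candidate_l1_sites(pairs: Iterable[ReadPair], l1_contig: str = "L1_consensus",
                       window: int = 500, min_support: int = 3):
    """Cluster the genomic anchors of discordant pairs whose other mate hits L1."""
    anchors = []
    for p in pairs:
        if is_concordant(p):
            continue
        if p.chrom2 == l1_contig and p.chrom1 != l1_contig:
            anchors.append((p.chrom1, p.pos1))
        elif p.chrom1 == l1_contig and p.chrom2 != l1_contig:
            anchors.append((p.chrom2, p.pos2))
    return cluster_anchors(anchors, window, min_support)
```

Real pipelines add mapping-quality filters, split-read evidence and checks for reference L1 copies; the sketch keeps only the two ideas named in the text.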

Research interest in SVs has increased sharply over the past decade, and with the advent of screening technologies, approximately 5000 insertions have been reported thus far [63]. Because most reported insertions are scattered across different databases, leading to redundancy, a compiled non-redundant list is essential. The Database of Retrotransposon Insertion Polymorphisms (dbRIP) provides a comprehensive list of such human genome variants (SINEs such as Alu, and LINEs). Data from published studies are collected and compiled into a non-redundant list of retrotransposon insertion polymorphisms (RIPs). The database has a simple genome browser-style design with graphical visualization of RIPs for easy navigation and information retrieval. Reported RIPs are classified by class, family and subfamily, with data on insertion size, chromosomal position, disease association and PCR conditions with expected amplicon sizes, along with references. Such a well-documented tool gives a much clearer picture of RIPs, comparable to the resources available for SNPs and CNVs. With next-generation platforms and organized data, it is now possible to study the role of these elements in shaping genome structure and their functional impact.
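A non-redundant catalogue of this kind needs a rule for deciding when two reported insertions are the same event. The Python sketch below is a hypothetical illustration, not dbRIP's actual schema or merging procedure: records of the same family on the same chromosome whose positions lie within a small tolerance (10 bp here, an arbitrary choice) of the first record in a group are collapsed into one entry, and their source references are pooled.

```python
from dataclasses import dataclass, field, replace
from typing import Iterable, List, Set


@dataclass
class RipRecord:
    """One reported insertion; all field names are hypothetical."""
    te_class: str        # e.g. "SINE" or "LINE"
    family: str          # e.g. "Alu" or "L1"
    subfamily: str
    chrom: str
    position: int        # insertion coordinate on the reference
    insert_size: int     # bp
    disease: str = ""    # disease association, if reported
    references: Set[str] = field(default_factory=set)


def merge_redundant(records: Iterable[RipRecord], tolerance: int = 10) -> List[RipRecord]:
    """Collapse records of one family on one chromosome lying within `tolerance` bp."""
    merged: List[RipRecord] = []
    for rec in sorted(records, key=lambda r: (r.chrom, r.family, r.position)):
        last = merged[-1] if merged else None
        if (last is not None and (last.chrom, last.family) == (rec.chrom, rec.family)
                and rec.position - last.position <= tolerance):
            last.references |= rec.references  # pool citations
            if not last.disease:
                last.disease = rec.disease
        else:
            # Copy so that merging never mutates the caller's records.
            merged.append(replace(rec, references=set(rec.references)))
    return merged
```

Comparing each record with the first member of its group, rather than with the most recent one, keeps a chain of nearby reports from drifting into one oversized entry.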

G) Summary and concluding remarks 

At least four principal mechanisms have been identified in the pathogenesis of COPD: inflammation, protease-anti-protease imbalance, oxidative stress and apoptosis. Of these, oxidative stress plays a pivotal role because it directly injures the respiratory tract and regulates the other mechanisms.

Oxidative stress elicits an inflammatory response and inhibits DNA repair in a dose-dependent manner; the resulting damage may accumulate at microsatellites, leading to genome instability. The wide distribution and complexity of mobile genetic elements in the genome make them another strong candidate contributor to genomic instability. In addition to the direct effect of insertional mutagenesis, alterations such as deletions, inversions and duplications can be attributed to the translocation of these active mobile elements. Studies on lung barrier epithelial cells have demonstrated the effect of airway inflammation and oxidative stress on genome instability. Upon exposure to cigarette smoke, barrier epithelial cells undergo epigenetic alterations that can trigger mobile elements such as L1s, thereby influencing multiple molecular pathways that enhance inflammatory signals. Novel L1 sites can be identified by whole genome analysis of epithelial cell DNA from smokers (COPD), ex-smokers (no COPD) and healthy controls against a reference genome. Such new L1 insertions can be compared with the profiles of microsatellite markers in patient samples to study the relationship between mobile genetic elements and genome instability and their potential role in a complex disorder such as COPD.

Figure 3 General scheme of a pipeline for identifying repeated sequences. (White inset boxes: examples of available computational tools.) Input query sequence data are preprocessed by screening for TEs, and cryptic structures (poly(A) tails, degenerate primers) are trimmed to avoid excessive mismatches. The data are then mapped against the reference genome and/or a repeat library to form clusters; for each cluster, programs such as MAP and MAFFT construct multiple alignments, yielding consensus sequences. In a post-processing step, each consensus is realigned with the reference using characteristic TE features as filter parameters, yielding concordant (YES) or discordant (NO) combinations. Concordant combinations are elements already present in the reference library, whereas discordant combinations are of greater interest because they represent putative novel elements.